Yuxin Guo

From Single Scan to Sequential Consistency: A New Paradigm for LIDAR Relocalization

Feb 03, 2026
Via arXiv

MM-Sonate: Multimodal Controllable Audio-Video Generation with Zero-Shot Voice Cloning

Jan 08, 2026
Via arXiv

Klear: Unified Multi-Task Audio-Video Joint Generation

Jan 07, 2026
Via arXiv

CVD-STORM: Cross-View Video Diffusion with Spatial-Temporal Reconstruction Model for Autonomous Driving

Oct 09, 2025
Via arXiv

AudioStory: Generating Long-Form Narrative Audio with Large Language Models

Aug 27, 2025
Via arXiv

ReasonPlan: Unified Scene Prediction and Decision Reasoning for Closed-loop Autonomous Driving

May 26, 2025
Via arXiv

Parallel Layer Normalization for Universal Approximation

May 19, 2025
Via arXiv

Aligned Better, Listen Better for Audio-Visual Large Language Models

Apr 02, 2025
Via arXiv

GenHancer: Imperfect Generative Models are Secretly Strong Vision-Centric Enhancers

Mar 25, 2025
Via arXiv

Monocular Depth Estimation and Segmentation for Transparent Object with Iterative Semantic and Geometric Fusion

Feb 20, 2025
Via arXiv